Abstract

More than 70 years after the first ex situ genebanks were established, major efforts in this field are still
concerned with further completing individual collections and securing their storage. Attempts
to valorize ex situ collections for plant breeders have been hampered by the limited availability of
phenotypic and genotypic information. With the advent of molecular marker technologies, first efforts were made
to fingerprint genebank accessions, albeit on a very small scale and mostly based on inadequate DNA marker
systems. Advances in DNA sequencing technology and the development of high-throughput systems for multiparallel
interrogation of thousands of single nucleotide polymorphisms (SNPs) now provide a suite of technological platforms
that facilitate the analysis of several hundred gigabases per day by state-of-the-art sequencing or the
parallel genotyping of thousands of SNPs. The present review summarizes recent developments regarding the
deployment of these technologies for the analysis of plant genetic resources, in order to identify patterns of genetic
diversity, map quantitative traits and mine novel alleles from the vast amount of genetic resources maintained in
genebanks around the world. It also addresses the various shortcomings and bottlenecks that need to be overcome
to leverage the full potential of high-throughput DNA analysis for the targeted utilization of plant genetic resources.
INTRODUCTION

Plant breeding needs to focus on traits with the 
greatest potential to increase yield under changing 
climate conditions [1]. Agricultural practices have 
gradually displaced local traditional varieties and 
crop wild relatives, leading to a dramatic loss of
indigenous biodiversity. Tapping into the rich genetic
diversity inherent in crop species and their wild
relatives is a prerequisite for germplasm improvement
in the future [2-7; http://www.fao.org]. Hence,
new technologies must be developed to accelerate 
breeding through improving genotyping and pheno- 
typing methods and by accessing the available gen- 
etic diversity stored in genebanks around the world. 



Prior to the advent of molecular characterization, 
accessions in germplasm collections were mainly 
examined based on morphological characters and 
phenotypic traits [8]. The development of molecular
techniques now allows a more accurate analysis
of large collections. High-throughput (HT) technol- 
ogies including DNA isolation, genotyping, pheno- 
typing and next-generation sequencing (NGS) 
provide new tools to add substantial value to gene- 
bank collections. The integration of genomic 
data into genebank documentation systems and its 
combination with taxonomic, phenotypic and ecological
data will usher in a new era for the valorization
of plant genetic resources (PGR). From the
determination of phenotypic traits to the application
of NGS to whole genomes, every aspect of genomics
will have a great impact not only on PGR conservation,
but also on their utilization in plant
breeding [9].

Figure 1: Ex situ collections are dominated by major crop species. (A) Of the more than 3000 crop species that are
maintained ex situ, 10 species totaling 3 540 000 accessions represent about half of the global inventory of ex situ
resources, which amounts to 7.4 million accessions. (B) Correlation of the aggregated size of the ex situ collections
with the acreage of the individual crop species.

Identification and tracking of genetic variation has 
become so efficient and precise that thousands of 
candidate genes can be tracked within large gene- 
bank collections [10]. Using NGS technologies, it is 
possible to resequence candidate genes, entire tran- 
scriptomes or entire plant genomes more efficiently 
and economically than ever before. Advances in 
sequencing technology will allow for whole-genome 
resequencing of hundreds of individuals. In this way, 
information on thousands of candidate genes and 
candidate regions can be harnessed for thousands of 
individuals to sample genetic diversity within and 
between germplasm pools, to map Quantitative 
Trait Loci (QTLs), to identify individual genes and 
to determine their functional diversity. In this 
review, we outline some important developments 
in this field, where NGS technologies are expected 
to enhance the value and thus the usefulness of gen- 
ebank collections. 



STATE OF EX SITU GERMPLASM 
RESOURCES 

PGR include cultivars, landraces, crop wild relatives 
and mutants. The loss of genetic diversity in many
crop plants has resulted in efforts to collect PGR
which were initiated by Vavilov early in the 20th 
century aiming at supporting plant breeders with 
genetic material to extend genetic variability, as a 
basis to create new crop varieties [11]. A wealth 
of germplasm collections is available worldwide, 
with more than 7 million accessions held in over
1700 genebanks (http://www.fao.org/docrep/013/i1500e/i1500e00.htm).
These collections do not cover all crop species
evenly but are strongly biased towards crops of
agricultural importance. About 50% of the global ex
situ germplasm is made up by only 10 crop species 
with the three largest collections (wheat, rice and 
barley) representing 28% of the global germplasm 
(Figure 1). Passport and genotypic data suggest that
collections include different degrees of duplication,
resulting in approximately 1.9-2.2 million distinct accessions
with the remainder being duplicates
(http://www.fao.org/docrep/013/i1500e/i1500e00.htm). Proper
conservation of PGR, along with the development
of best genebank practices and the promotion of their
effective use, is vital for food security in the future [12].
However, ex situ conservation is rather fragmented, 
largely because it is mainly based on national programs
and scattered institutional efforts. For instance,
barley (Hordeum vulgare L.) is maintained in more
than 200 collections worldwide, amounting to approximately
470 000 accessions [13]. Other crop species
follow similar patterns [14]. Despite manifold
efforts to coordinate genebank activities, conservation



Kilian and Graner 



is still inefficient in many places and suffers from 
variable or even lacking standards, unreliable access 
and poor characterization and documentation of the 
material [15]. Ex situ germplasm collections for crop 
wild relatives are rather limited in size due to the 
difficulties in maintaining non-domesticated plants 
[16]. Introgression from wild to cultivated germplasm
and vice versa, both during seed multiplication
in genebanks and in the wild, poses a problem
for proper maintenance and correct classification of
the material, which usually is based on only a few
morphological characters. Another problem is that genebank
accessions, even if they represent inbreeding
crop species, are often genetically heterogeneous and
may show residual heterozygosity. While this may
reflect the original genetic state, e.g. of a landrace
accession, it can seriously impair its molecular characterization
and its subsequent use for research and
breeding. Thus, most core collections are made up of
accessions which underwent purification by single 
seed descent (SSD). 

Systematic phenotypic analysis of genebank collections
is a time- and resource-intensive effort which
has been mainly restricted to agronomic traits
that show a high heritability and can be assessed
based on the per se performance of an accession.
Therefore, most evaluation efforts have focused on
traits such as disease resistance and important morphological
characters (yield components) [8, 17].
Deep genetic and phenotypic characterization of 
genetic resources by HT techniques, including rese- 
quencing of enriched candidate genes and low- 
coverage full-genome resequencing will increasingly 
become available. Concomitantly large amounts of 
data need to be integrated within the current docu- 
mentation systems. Genebanks have to prepare for 
entering the genomics era by developing new strate- 
gies and novel information tools to assess the genetic 
diversity represented in their collections. Although 
there have been some successful examples of extract- 
ing useful genes from genebanks, the vast potential of 
this resource still remains largely untapped [18, 19]. 

CHARACTERIZATION OF 
GERMPLASM BY MOLECULAR 
MARKERS: THE CURRENT STATE 

A large series of studies has been undertaken to
study the diversity, domestication, evolution and phylogeny
of PGR, largely selected from genebank collections.
Early studies considered morphological and
cytogenetic characters. Various other techniques and
molecular markers have been applied subsequently
[20-23]. Until recently, amplified fragment length
polymorphisms (AFLPs) or simple sequence repeats
(SSRs) were the molecular markers of choice for
DNA fingerprinting of crop genomes [24-26].
Owing to their amenability to systematic development
and HT detection, SNP markers are increasingly
applied to study genetic diversity in germplasm collections
of up to several hundred accessions.
Many of these collections have been established as 
association panels for linkage disequilibrium (LD) 
mapping, thus providing a first link between pheno- 
typic and genotypic data sets. The corresponding 
accessions have been selected from various germ- 
plasm sources or breeding programs to represent a 
rough cross section of the overall genetic diversity 
available for a given species or for an ecogeographical 
region [27, 28]. This is exemplified by a population 
comprising 224 spring barley accessions, which were 
selected from the Barley Core Collection, BCC [29] 
and complemented by additional accessions to cover 
the entire distribution range of this crop [30]. More
recently, about 1500 spring barley landraces adapted
to temperate climate conditions were selected among
22 093 Hordeum accessions of the Federal ex situ gen- 
ebank (IPK Gatersleben, Germany), based on their 
origin and morphology. The whole set has been 
genotyped by 43 SSR markers and analyzed for its 
genetic structure. While this is intended to usher in 
large-scale fingerprinting analysis of barley genebank 
accessions, the approach still falls short of providing 
informed molecular access to the entire collection. 
The performance of different marker systems in estimating
genetic diversity and population parameters can be
compared within a single collection, as recently shown
in [31], where 42 SSR markers were compared with 1536 SNP
markers. The marker type of choice and the number
of markers to be studied have to be adjusted for each
species and project.

Allele mining of individual loci 

Plant accessions from wild or locally adapted landrace 
genepools conserved in genebanks contain a rich 
repertoire of alleles that have been left behind 
by the selective processes of domestication, selection 
and cross-breeding that paved the way to today's 
elite cultivars. These resources stored in genebanks 
remain underexplored owing to a lack of effi- 
cient strategies to screen, isolate and transfer import- 
ant alleles. The most effective strategy for 
determining allelic richness at a given locus is currently
to determine its DNA sequence in a representative
collection of individuals. Large-scale allele
mining projects for germplasm collections at the molecular
level are needed, such as the one described for Pm3
in wheat. Bhullar et al. [18] first selected a set of 1320
bread wheat landraces from a virtual collection of
16 089 accessions, using the focused identification
of germplasm strategy (FIGS), and isolated seven
new resistance alleles of the powdery mildew resistance
gene Pm3. Similarly, a series of novel alleles
have been detected for a recessive gene conferring 
virus resistance in barley [32, 33]. Further resequencing
studies of candidate genes for agriculturally important
traits have been published, albeit for
smaller collections and mostly without functional
characterization [34-40].

Resequencing of candidate genes using Sanger 
sequencing has been applied to study phylogenetic 
relationships of crop plants, their domestication, evo- 
lution, speciation and ecological adaptation. Early 
studies resequenced a single locus or few loci in 
only few individuals per species [41, 42]. Reduced 
costs for Sanger sequencing using capillary instruments
and 96-well formats facilitated multilocus studies
in larger collections [43-51].

NGS technologies to screen germplasm 
collections 

Large-scale NGS is now possible using platforms 
such as Illumina/GA, Roche/GS FLX, Applied 
Biosystems/SOLiD and cPAL sequencing [52, 53]. 
The declining cost of generating such data is transforming
all fields of genetics [54]. Many crop plant
genomes are characterized by the vast abundance of 
repetitive DNA. For example, the genome of barley 
comprises >5 Gb of DNA sequence of which <2% 
can be accounted for by genes [55]. Therefore, to 
avoid excessive sequencing of putatively non-informative,
repetitive DNA, reduced-representation
sequencing techniques have been developed to
home in on a subset of the genome for sequencing
[56, 57]. When combined with techniques for labeling
reads (barcoding), DNA from many individuals
can be analyzed in the same pooled sequencing reaction,
which makes NGS an increasingly affordable
means of genotyping. These technologies are therefore becoming a
standard choice for generating genetic data in fields 
such as population genetics, conservation genetics 
and molecular ecology. On the other hand, the
deluge of sequence data they produce will entail the need
to develop an appropriate IT infrastructure and new
computational solutions [58-64].
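
The barcoding principle mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each read starts with an exact, error-free barcode; the function name, the barcode sequences and the accession labels are hypothetical and do not correspond to any particular platform or software.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign pooled reads to samples by their leading barcode.

    reads    -- iterable of read sequences (strings)
    barcodes -- dict mapping barcode sequence -> sample name (hypothetical)
    Returns a dict: sample name -> list of reads with the barcode trimmed.
    Reads matching no barcode are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Strip the barcode so only the genomic part is kept.
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Toy data: two accessions pooled in one sequencing reaction.
barcodes = {"ACGT": "accession_1", "TGCA": "accession_2"}
pooled = ["ACGTGGATTC", "TGCACCTTAG", "ACGTTTTGGA", "GGGGAAAACC"]
result = demultiplex(pooled, barcodes)
```

Real data would additionally require tolerance for sequencing errors in the barcode, which this sketch deliberately omits.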

Sequencing many individuals at low depth is another
attractive strategy, e.g. for complex trait association
studies, as shown in [65]. While detailed
analysis of a single individual typically requires deep
sequencing, resequencing of many individuals allows
a drastic reduction of sequencing depth when combined
with efficient genotype imputation to compensate
for missing data. Genotype imputation has been used
widely in the analysis of genome-wide association 
studies (GWAS) to boost power and to facilitate 
the combination of results across different studies 
using meta-analyses [66, 67]. 
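
Why low-depth data depend on imputation can be made concrete with a small calculation. The sketch below assumes an idealized model in which reads fall uniformly at random on the genome, so that the number of reads covering a site is Poisson distributed; this model and the function name are assumptions made for illustration and are not taken from the studies cited above.

```python
import math


def fraction_covered(mean_depth, min_reads=1):
    """Poisson probability that a site is covered by at least `min_reads` reads.

    mean_depth -- average sequencing depth per individual
    min_reads  -- minimum number of reads required to call a site
    """
    # Probability of observing fewer than `min_reads` reads at a site.
    p_fewer = sum(
        math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


# Expected fraction of sites with at least one read at different depths.
for depth in (0.5, 1.0, 5.0):
    print(depth, round(fraction_covered(depth), 3))
```

Under this model, a large share of sites in each individual remains uncovered at depths of one-fold or less, which is the gap that imputation across many individuals is meant to fill.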

We have not yet reached the point at which rou- 
tine whole-genome resequencing of large numbers 
of crop plant genomes becomes feasible. Therefore, 
it is necessary to select genomic regions of interest 
and to enrich these regions before sequencing. 
Sequencing targeted regions of DNA (e.g. the 
exome or parts thereof) rather than complete genomes
will likely be the preferred approach for most
genomics applications including evolutionary biol- 
ogy, association mapping and biodiversity conserva- 
tion [68]. Sequencing targeted regions on massively 
parallel-sequencing instruments requires methods for 
concomitant enrichment of the templates to be 
sequenced. There are several enrichment approaches 
available, each with advantages and disadvantages 
[69-72]. Resequencing allows fingerprinting of 
many individuals without ascertainment bias which 
is inherent to some SNP marker systems [73—75]. 

As outlined above, targeted resequencing of 
hundreds of loci in genebank collections is already 
feasible. Yet, the costs for DNA extraction, com- 
plexity reduction and barcoding need to be brought 
down for systematic resequencing of genebank col- 
lections. In this context, large efforts have recently 
been made to automate protocols for massively 
parallel (re)sequencing and data analysis in order
to match the increasing instrument throughput.
These protocols, which include e.g. large-scale automatic
library preparation and size selection on
robots [76] or fully automated construction of barcoded
libraries [77], might pave the way
for automated NGS technologies to screen genebank
collections [78].

Multiparallel resequencing studies 

Triggered by advancements in sequencing technol- 
ogies, several crop genome sequences have been 
produced or are underway [79-82]. Once good
quality levels have been achieved, these sequences 
will enable researchers to address all kinds of biological
questions or to link sequence diversity accurately
to phenotypes.

Rapid developments in NGS will soon make 
whole-genome resequencing in several individuals 
or targeted resequencing of large germplasm collections
a reality. This will help to eliminate an important
difficulty in estimating LD and genetic relationships
between accessions from bi-allelic
genotyping studies, namely ascertainment bias, which
particularly affects rare alleles [73, 83-85].

Based on the available Arabidopsis thaliana (L.) 
Heynh. genome sequence, Weigel and Mott [86] 
advocated a 1001 Genomes project for Arabidopsis. 
Several Arabidopsis lines have been sequenced since 
[87, 88]. First studies on whole-genome resequen- 
cing in crop species have been published for rice and 
maize [66, 89, 90]. 

Combined genetic approaches have been applied to species
for which a complete genome sequence and millions of
SNPs are available. Such
approaches, which include e.g. large-scale genotyping,
targeted genomic enrichment, whole-genome resequencing
and GWAS, have been used to identify
allelic diversity, rare genetic variation and QTL and
to characterize them functionally [91-96], or to identify
selective sweeps of favorable alleles and candidate
mutations that have had a prominent role in domes- 
tication [97]. 

TRAIT MAPPING IN PLANTS

Genome-wide marker discovery using NGS

SNPs are the most abundant form of genetic variation
in eukaryotic genomes and are no longer a limiting
factor, even for crop species with large
genomes such as barley [98]. SNP markers are rapidly
replacing SSRs or Diversity Arrays Technology
(DArT) [99] markers because they are more abun- 
dant, reproducible, amenable to automation and 
increasingly cost-effective [100, 101]. SNP-based 
resources are presently being developed and made 
publicly available for broad application in crop re- 
search [102]. 

A high-quality genomic sequence as it is available 
for Arabidopsis and rice represents the ideal blueprint 
for resequencing and the identification of SNPs. 
But even for species with less complete genomic
sequences such as barley and wheat [103, 104] or
other species [105-109], NGS methods are valuable
for genome-wide marker development, genotyping
and targeted sequencing across the genomes of populations
[110-112]. These new methods, which include
e.g. reduced-representation libraries (RRLs)
[113-115], complexity reduction of polymorphic
sequences (CRoPS) [116, 117], restriction-site-associated
DNA sequencing (RAD-seq) [118] and
low-coverage sequencing for genotyping [119-121],
are applicable to genetic analysis in non-model species,
in species with high levels of repetitive DNA or
in breeding germplasm with low levels of polymorphism,
without the need for prior sequence information.
These methods can be applied to
compare SNP diversity within and between closely 
related plant species or within wild natural popula- 
tions [122, 123]. 

Genome-wide association studies in crop 
plants 

The systematic characterization and utilization of 
naturally occurring genetic variation has become an 
important approach in plant genome research and 
plant breeding. So far, linkage mapping based on 
bi-parental progenies has proven useful in detecting 
major genes and QTLs [124, 125]. Although this 
approach has been successful in many analyses, it 
suffers from several drawbacks. LD or association
mapping is an attractive alternative to traditional
linkage mapping and has several advantages,
e.g. it exploits unstructured
populations that have been subjected to many
recombination events [126-128]. GWAS in diverse
germplasm collections offer new perspectives 
towards gene and allele discovery for traits of agri- 
cultural importance and dissecting the genetic basis 
of complex quantitative traits in plants [129, 130]. 
However, GWAS require a genome-wide assess- 
ment of genetic diversity (preferably based on a ref- 
erence genome sequence and resequenced parts 
thereof), patterns of population structure, and the 
decay of LD. For this, effective genotyping techniques
for plants, high-density marker maps, phenotyping
resources and, if possible, a high-quality
reference genome sequence are required [131]. In
many cases, the results of GWAS need to be confirmed
by linkage analysis.
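
The decay of LD is commonly summarized by the squared correlation r^2 between pairs of markers, plotted against their genetic or physical distance. The following minimal sketch computes r^2 for two bi-allelic SNPs. It assumes homozygous (e.g. SSD-purified) lines scored as 0 or 1 and no missing data; the function name and the coding scheme are illustrative assumptions.

```python
def r_squared(snp_a, snp_b):
    """r^2 between two bi-allelic SNPs scored 0/1 in homozygous lines."""
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("SNP vectors must be non-empty and of equal length")
    p_a = sum(snp_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n  # frequency of allele 1 at SNP B
    # Frequency of lines carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / denom


# Completely linked markers versus independent markers (toy data).
linked = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
independent = r_squared([0, 0, 1, 1], [0, 1, 0, 1])
```

Averaging such values over marker pairs within distance classes gives the LD decay curve that determines how many markers are needed for a given population.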

GWAS have identified a large number of SNPs 
associated with disease phenotypes in humans, also 
in diverse worldwide populations [132]. Early 
association mapping studies in crop plants were hampered
by the limited number of
mapped markers and thus were mainly based on
resequencing candidate genes [39, 40]. The development
of comprehensive sets of SNP markers that can
be interrogated by highly multiparallel HT SNP genotyping
ushered in the era of germplasm diversity
studies and GWAS in crop plants [87, 98, 119,
133-138].

For barley, few germplasm collections including 
wild and landrace barley have been genotyped using 
custom-made OPAs (oligo-pool assays) by Illumina 
GoldenGate technology [139, 140]. SNP markers 
significantly associated with traits are being used to 
identify genomic regions that harbor candidate 
genes for these traits in various collaborative barley 
projects. It is relatively easy to detect marker-trait
associations in barley cultivar populations that
have extensive LD (5-10 cM). Conversely, populations
with low LD are expected to provide
high-resolution associations (landraces, <5 cM; wild
barley, <1 cM), but the number of markers needed to
find significant associations is relatively high. This
rapid decay in LD in populations of wild germplasm 
is a key generic problem with genotyping for 
bi-allelic SNPs. Furthermore, ascertainment bias in
bi-allelic SNP discovery, caused e.g. by rare alleles and
alleles not present in the elite cultivars, complicates
the situation in landraces and wild germplasm [73,
141]. Thus, rare alleles are usually excluded from analysis.
Higher marker coverage is required in order to
identify candidate genes more efficiently in diverse 
collections. In the case of barley, a high-density SNP
chip has been developed, which contains 7864
bi-allelic SNPs derived from NGS of a broad range
of barley cultivars (R. Waugh et al., unpublished
data). Such customized arrays for HT SNP genotyping
can accelerate genetic gain in breeding programs.
First barley association panels have been genotyped
using this resource (Figure 2). Similar SNP chips are
becoming available for an increasing number of crop
plants [142, 143]. Combined studies using GWA
mapping, comparative analysis, linkage mapping,
resequencing and functional characterization of candidate
genes have already enabled the identification of
candidate genes for selected traits [66, 91, 128].

Figure 2: NeighborNet [166] of Hamming distances for 6885 polymorphic SNPs among 271 barley cultivars using
the 9K Infinium iSELECT HD custom genotyping BeadChip. Barley cultivars Barke, Bowman and Morex are highlighted
as reference genotypes. Winter barleys form a cluster, which separates them clearly from the remaining
spring barley accessions.
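
The Hamming distance used for Figure 2 simply counts the SNP positions at which two genotype profiles differ. A minimal sketch is given below; the cultivar names and the five-SNP profiles (alleles coded A/B) are invented toy data, and the construction of the network itself is not shown.

```python
from itertools import combinations


def hamming(profile_a, profile_b):
    """Number of SNP positions at which two genotype profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same SNPs")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def distance_table(genotypes):
    """Pairwise Hamming distances for a dict of name -> SNP profile string."""
    return {
        (x, y): hamming(genotypes[x], genotypes[y])
        for x, y in combinations(sorted(genotypes), 2)
    }


# Hypothetical profiles at five bi-allelic SNPs.
genotypes = {
    "cultivar_A": "AABBA",
    "cultivar_B": "AABBB",
    "cultivar_C": "BBAAB",
}
table = distance_table(genotypes)
```

A table of this kind is the input to distance-based methods such as neighbor-joining or NeighborNet; missing calls and heterozygous genotypes would need explicit handling in practice.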

While genotyping arrays are useful for assessing 
population structure and the decay of LD across 
large numbers of samples, low-coverage whole- 
genome sequencing will become the genotyping 
method of choice for GWAS in plant species [66].
As for humans, GWAS for plants will become the 
primary approach for identifying haplotypes and 
genes with common alleles influencing complex 
traits. However, common variations identified by 
GWAS account for only a small fraction of trait her- 
itability and are unlikely to explain the majority of 
phenotypic variations of common traits. A potential 
source of the missing heritability is the contribution 
of rare alleles, insertion-deletion polymorphisms,
copy number variants and epigenetic differences,
which can be detected by NGS technologies. However,
testing the association of rare variants with pheno- 
types of interest is challenging. Novel powerful asso- 
ciation methods designed for large-scale resequencing 
data have to be developed [144-149]. 

In the future, it can be expected that mapping 
by sequencing will become the method of choice 
to discover the genes underlying quantitative trait 
variation in large purified germplasm collections 
[150-152] or epigenetic variation [84, 88, 153-155]. 



OUTLOOK 

PGR of crop wild relatives or locally adapted crop 
landraces contain a rich repertoire of alleles that have 
been lost through the selective processes that generated
today's elite cultivars. Such alleles represent an
invaluable asset to cope with future challenges for 
sustainable agricultural development and food production
[156, 157]. In the medium term, draft
genome sequences will be available for all major
and many neglected crop species, and resequencing
of these genomes in germplasm collections will yield 
a wealth of information. Transforming this deluge of 
data to information and knowledge will increase our 
understanding in all fields of genetics including evo- 
lution, ecology, domestication and breeding. Now is 
a crucial time to explore the potential implications of 
this information revolution for genebanks and to 
recognize opportunities and limitations in applying
NGS tools and HT technologies to genebank collections
[56, 158].

Sequence informed conservation and 
utilization of PGR 

The availability of sequence information can make a 
significant contribution to the conservation of PGR. 
The high degree of redundancy found between different
ex situ collections wastes a prohibitive amount
of resources (see above). Across the board, two-thirds
of the seed multiplication, which is the most resource-intensive
step of all conservation efforts, could be made
redundant if there were ways to unambiguously
identify duplicates. Most attempts to identify duplicated
samples have suffered from the difficulty of agreeing on
a common set of markers for a given species and from manifold
problems in reproducing DNA marker data between
different labs. DNA sequences do not suffer
from such shortcomings and therefore represent an
ideal information platform to tackle the issue of redundancy.
Arguably, sequencing of ex situ collections
just for the sake of eliminating redundancy would be
too expensive an undertaking. Combining this
effort with one of the issues mentioned below could
provide added value.
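
How sequence data could flag candidate duplicates is sketched below in a deliberately simplified form. It assumes that every accession is represented by a single, complete and error-free profile over the same set of sites, so that duplicates show identical profiles; the accession identifiers and the function name are hypothetical. Heterogeneous accessions, missing data and sequencing errors would call for a distance threshold rather than exact identity.

```python
from collections import defaultdict


def putative_duplicates(profiles):
    """Group accession IDs that share an identical sequence or marker profile.

    profiles -- dict mapping accession ID -> profile (string or sequence)
    Returns a list of sorted ID lists, one per group of two or more accessions.
    """
    groups = defaultdict(list)
    for accession, profile in profiles.items():
        # A tuple is hashable, so identical profiles fall into the same group.
        groups[tuple(profile)].append(accession)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Toy example: two holdings with the same profile, one distinct accession.
profiles = {
    "GB_001": "ACGT",
    "GB_002": "ACGA",
    "GB_003": "ACGT",
}
duplicates = putative_duplicates(profiles)
```

Groups reported in this way would only be candidates for merging; passport data and phenotypic checks would still be needed before any accession is treated as redundant.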

Clearly, large crop collections cannot be sequenced
in a single step. Against the backdrop of the evolving
technology, a stepwise approach should be envisaged.
Glaszmann et al. [19] suggested the development of
'core reference sets' for our crops. A core reference 
set (CRS) is to be understood as 'a set of genetic 
stocks that are representative of the genetic resources 
of the crop and are used by the scientific community 
as a reference for an integrated characterization of its 
biological diversity'. Every CRS will serve as a public, 
standardized and well characterized resource for the 
scientific community. Well characterized, multiplied, 
isolated CRS have to be maintained for reference 
purposes, comparative studies, future reanalysis and 
integrative genomic analysis [59]. 

For this, already existing core collections must 
be transformed into genetic stocks, purified (homo- 
geneous/stabilized) and taxonomically classified to 
facilitate practical choices for comparative associ- 
ation studies. One other approach is to select di- 
verse accessions directly from genebank collections 
based on all available pre-existing characterization 
and evaluation data (C&E), pedigree, origin and 
collection site information. Survey genotyping to
test the purity of accessions can be done with various
molecular marker types such as inter-simple
sequence repeats (ISSRs) or AFLPs. Mixed accessions
including more than one genotype have to
be advanced by SSD before entering into systematic
molecular and phenotypic characterization
(Figure 3).

Figure 3: DNA genotyping and sequencing as integral components for conservation and valorization of plant genetic
resources.

The scope of a genebank may be extended to that 
of a DNA bank, similar to biobanks devoted to target 
medical research [159]. The various implications of 
DNA banks for PGR have been discussed elsewhere. 
Common standards and Biobank Information 
Management Systems (BIMSs) have to be developed 
to deal with highly complex and diverse sets of meta- 
data. Advanced technologies for high-quality bio- 
sample storage and management systems are 
available and have to be implemented [160, 161]. 

Precise phenotyping is one of the major bottle- 
necks in characterizing large collections. New, 
non-invasive, automated image analysis technologies 
are currently under development for systematic phe- 
notyping under greenhouse and field conditions 
using novel sensing and imaging technologies. 
Phenomics is an emerging field, in which large and 
complex data sets are being produced. These require 
long-term storage for future reanalysis when software 
tools and algorithms have improved or for compara- 
tive analysis [162, 163]. Pre-selection of contrasting 
accessions by different strategies including allele 
mining approaches, genotyping using custom-made 
Bead Chips and morphological characterization are 
effective strategies to reduce the number of
accessions prior to thorough phenotyping, the
latter being the most time-consuming step.

The ultimate goal regarding the valorization of 
PGR will be the deployment of novel alleles that 
will improve the trait under consideration. While 
resequencing of candidate genes is a straightforward 
approach to identify allelic variation, deployment of 
novel alleles in a breeding program is contingent on 
prior phenotypic validation. So far, this has been restricted
to major genes, e.g. for disease resistance and
seed quality. Validation of alleles of candidate genes
for quantitative traits still remains a major challenge
(e.g. by Targeting Induced Local Lesions in Genomes
(TILLING)) [164, 165]. In this regard, the ability
to replace alleles by site-specific recombination
could spur the targeted utilization of PGR and
thus greatly enhance the value chain of biodiversity.



Key Points 

• Novel statistical approaches and promising NGS approaches are
becoming available to screen major genebank collections. NGS
will provide a platform for the large-scale development of SNPs
that can be assayed in a highly parallel manner for HT genotyping.

• As an alternative to SNP analysis, genotyping by sequencing will be
employed to obtain information on SNP and haplotype patterns.

• A staggered strategy starting from core collections is proposed
to genotype and/or resequence genetic resources.

• Leverage of the full potential of sequence information on PGR 
depends on the availability of accurate phenotypic information 
and the potential to validate novel alleles at the phenotypic level. 